BACKGROUND AND HISTORY OF PESTICIDAL CRYSTAL PROTEIN NOMENCLATURE

Since the first cloning of an insecticidal crystal protein gene from Bacillus thuringiensis (91), many other such genes have been isolated. Initially, each newly characterized gene or protein received an arbitrary designation from its discoverers: icp (64); cry (21, 121); kurhd1 (31); Bta (88); bt1, bt2, etc. (40); type B and type C (43); and 4.5 kb, 5.3 kb, and 6.6 kb (55). The first systematic attempt to organize the genetic nomenclature relied on the insecticidal activities of crystal proteins for the primary ranking of their corresponding genes (44). The cryI genes encoded proteins toxic to lepidopterans; cryII genes encoded proteins toxic to both lepidopterans and dipterans; cryIII genes encoded proteins toxic to coleopterans; and cryIV genes encoded proteins toxic to dipterans alone.

This system provided a useful framework for classifying the ever-expanding set of known genes. The original scheme contained inconsistencies, however, because it tried to accommodate genes that were highly homologous to known genes but did not encode a toxin with a similar insecticidal spectrum. The cryIIB gene, for example, was placed in the lepidopteran-dipteran class with cryIIA, even though toxicity against dipterans could not be demonstrated for the toxin designated CryIIB. Other anomalies arose after the nomenclature was established. The protein named CryIC, for example, was reported to be toxic to both dipterans and lepidopterans (103), while the protein designated CryIB was reported to be toxic to both lepidopterans and coleopterans (8). Because the nomenclature system provided no central committee or database to maintain standardization, new genes encoding a diverse set of proteins without a common insecticidal activity each received the name cryV, based on the next available Roman numeral (32, 46, 67, 100, 102, 108).

PROPOSED NOMENCLATURE 

We propose in this review a revised nomenclature for the cry and cyt genes. To organize the wealth of data produced by genomic sequencing efforts, a new nomenclatural paradigm is emerging, exemplified by the internationally recognized cytochrome P-450 superfamily nomenclature system (68a, 122a). Our proposal conforms closely to this model both in conceptual basis and in nomenclature format. The underlying basis of this type of system is to assign names to members of gene superfamilies according to their degree of evolutionary divergence as estimated by phylogenetic tree algorithms. The nomenclature format in such a system is designed to convey rich informational content about these relationships by appending to the mnemonic root a series of numerals and letters assigned in a hierarchical fashion to indicate degrees of phylogenetic divergence. This change from a function-based to a sequence-based nomenclature allows closely related toxins to be ranked together and removes the necessity for researchers to bioassay each new protein against a growing series of organisms before assigning it a name.

In our proposed revision, Roman numerals have been exchanged for Arabic numerals in the primary rank (e.g., Cry1Aa) to better accommodate the large number of expected new proteins. The mnemonic Cyt, designating crystal proteins that show a general cytolytic activity in vitro, has been retained because of its historical precedent and entrenchment in the research literature. Our definition of a Cry protein is rather broad: a parasporal inclusion (crystal) protein from B. thuringiensis that exhibits some experimentally verifiable toxic effect to a target organism, or any protein that has obvious sequence similarity to a known Cry protein. Similarly, Cyt denotes a parasporal inclusion (crystal) protein from B. thuringiensis that exhibits hemolytic activity, or any protein that has obvious sequence similarity to a known Cyt protein. By these criteria, the nontoxic 40-kDa crystal protein from B. thuringiensis subsp. thompsoni, for example, has been excluded from our list, but the lepidopteran-active 34-kDa protein (now Cry15A) encoded by an adjacent gene has been included (11).

The freely available software applications CLUSTAL W (110) and PHYLIP (27) are used to define the sequence relationships among the toxins; these relationships form the framework of the new nomenclature. In the first step, CLUSTAL W aligns the deduced amino acid sequences of the full-length toxins and produces a distance matrix quantifying the sequence similarities among the set of toxins. CLUSTAL W default settings are employed, except that the "delay divergent sequences" setting in the multiple-alignment parameter menu is reduced from 40 to 0%. The NEIGHBOR application within the PHYLIP package then constructs a phylogenetic tree from the distance matrix by the unweighted pair-group method using arithmetic averages (UPGMA). The TREEVIEW application (73), with the "phylogenetic tree" and "ladderize left" options selected, produces a graphic presentation of the resulting tree.
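For readers unfamiliar with UPGMA, the clustering step can be sketched in a few lines of Python. This is a minimal illustration, not the NEIGHBOR program: the function name `upgma`, the frozenset-keyed distance dictionary, and the toy distances are assumptions chosen for clarity. At each step the two closest clusters are merged, the new node is placed at half their distance, and distances to the merged cluster are recomputed as size-weighted arithmetic averages.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster sequences with UPGMA (unweighted pair-group method using averages).

    labels: list of sequence names (strings).
    dist:   dict mapping frozenset({a, b}) -> pairwise distance for every pair.
    Returns (tree, merges): tree is a nested tuple of names, and merges lists
    (cluster, node_height) in the order the clusters were joined.
    """
    size = {name: 1 for name in labels}  # number of sequences in each cluster
    d = dict(dist)  # working copy; grows as new clusters are formed
    active = list(labels)
    merges = []
    while len(active) > 1:
        # Join the closest pair of active clusters.
        a, b = min(combinations(active, 2), key=lambda pair: d[frozenset(pair)])
        new = (a, b)
        merges.append((new, d[frozenset((a, b))] / 2))  # node sits at half the distance
        for c in active:
            if c != a and c != b:
                # Size-weighted average = mean over all sequence pairs between clusters.
                d[frozenset((new, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[new] = size[a] + size[b]
        active = [c for c in active if c != a and c != b] + [new]
    return active[0], merges


# Toy distances (1 minus fraction identical) for four hypothetical toxins.
pairs = {("A", "B"): 0.05, ("A", "C"): 0.30, ("A", "D"): 0.60,
         ("B", "C"): 0.30, ("B", "D"): 0.60, ("C", "D"): 0.60}
tree, merges = upgma(["A", "B", "C", "D"], {frozenset(p): v for p, v in pairs.items()})
print(tree)  # ('D', ('C', ('A', 'B')))
```

Because every node height is half an averaged pairwise distance, all leaves end up equidistant from the root; this ultrametric property is what allows a single vertical "ruler" to be laid across the tree.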

We have applied this procedure to the set of holotype sequences given in Table 1 to produce the phylogenetic tree presented in Fig. 1. Vertical lines drawn through the tree show the boundaries used to define the various nomenclatural ranks. The name given to any particular toxin depends on the location of the node where the toxin enters the tree relative to these boundaries. A new toxin that joins the tree to the left of the leftmost boundary will be assigned a new primary rank (an Arabic number). A toxin that enters the tree between the left and central boundaries will be assigned a new secondary rank (an uppercase letter); it will have the same primary rank as the other toxins within that cluster. A toxin that enters the tree between the central and right boundaries will be assigned a new tertiary rank (a lowercase letter). Finally, a toxin that joins the tree to the right of the rightmost boundary will be assigned a new quaternary rank (another Arabic number). Independently isolated toxins with identical sequences will receive separate quaternary ranks.
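The boundary rule just described amounts to a threshold lookup on where a new toxin joins the tree. The sketch below is hypothetical: it assumes the join point has already been read off the tree as a single percent-identity value, and it hard-codes the approximate boundaries of 45, 78, and 95% identity given later in this section. How a value falling exactly on a boundary is treated is also an assumption.

```python
# Approximate boundary positions (percent amino acid identity), left to right.
PRIMARY_BOUNDARY = 45.0
SECONDARY_BOUNDARY = 78.0
TERTIARY_BOUNDARY = 95.0


def new_rank_for_join(percent_identity):
    """Say which rank is new for a toxin joining the tree at this identity level."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity < PRIMARY_BOUNDARY:
        return "primary"      # left of the leftmost boundary: new Arabic number
    if percent_identity < SECONDARY_BOUNDARY:
        return "secondary"    # between left and central boundaries: new uppercase letter
    if percent_identity < TERTIARY_BOUNDARY:
        return "tertiary"     # between central and right boundaries: new lowercase letter
    return "quaternary"       # right of the rightmost boundary: new Arabic number


for identity in (30.0, 60.0, 85.0, 100.0):
    print(identity, new_rank_for_join(identity))
```

A toxin identical to a known one (100%) still falls in the quaternary band, consistent with identical but independently isolated sequences receiving separate quaternary ranks.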

By this method each toxin will be assigned a unique name incorporating all four ranks. A completely novel toxin would currently be assigned the name Cry23Aa1. For the sake of convenience, however, we propose that the inclusion of the tertiary rank a and quaternary rank 1 be optional, their use dictated only by a need for clarity. This new toxin could therefore simply be referred to as Cry23A.
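The four-rank name format, and the optional dropping of a tertiary a and quaternary 1, can be illustrated with a small parser. The regular expression and the function names are hypothetical, and reading "a need for clarity" as "another known toxin shares the same primary and secondary ranks" is an assumption made for this sketch, not a committee rule.

```python
import re

# Root (Cry or Cyt, either case), primary number, secondary uppercase letter,
# then an optional tertiary lowercase letter and optional quaternary number.
NAME_PATTERN = re.compile(
    r"^(?P<root>[Cc]ry|[Cc]yt)(?P<primary>\d+)(?P<secondary>[A-Z])"
    r"(?:(?P<tertiary>[a-z])(?P<quaternary>\d+)?)?$"
)


def parse_name(name):
    """Split a name such as Cry1Ac10 into (root, primary, secondary, tertiary, quaternary)."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a Cry/Cyt name: {name!r}")
    quaternary = match["quaternary"]
    return (match["root"], int(match["primary"]), match["secondary"],
            match["tertiary"], int(quaternary) if quaternary else None)


def abbreviate(name, known_names):
    """Shorten e.g. Cry23Aa1 to Cry23A when no other known toxin shares its
    primary and secondary ranks (assumed reading of "need for clarity")."""
    root, primary, secondary, tertiary, quaternary = parse_name(name)
    if tertiary != "a" or quaternary not in (None, 1):
        return name
    group = (root.lower(), primary, secondary)
    for other in known_names:
        if other.lower() == name.lower():
            continue
        o_root, o_primary, o_secondary, _, _ = parse_name(other)
        if (o_root.lower(), o_primary, o_secondary) == group:
            return name  # a sibling exists, so the full name is needed for clarity
    return f"{root}{primary}{secondary}"


print(parse_name("Cry1Ac10"))                            # ('Cry', 1, 'A', 'c', 10)
print(abbreviate("Cry23Aa1", ["Cry22Aa1", "Cry23Aa1"]))  # Cry23A
print(abbreviate("Cry9Da1", ["Cry9Da1", "Cry9Da2"]))     # Cry9Da1
```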

In choosing locations for rank boundaries, we attempted to construct a nomenclature reflecting significant evolutionary relationships while at the same time minimizing changes from the gene names assigned under the old system. In the resulting system, proteins with a common primary rank are similar enough that the percent identity can be defined with some confidence. Proteins with the same primary rank often affect the same order of insect; those with different secondary and tertiary ranks may have altered potency and targeting within an order. At the tertiary rank, differences can be due to the accumulation of dispersed point mutations, but often they appear to have resulted from ancestral recombination events between genes differing at a lower rank level (9). The quaternary rank was established to group "alleles" of genes coding for known toxins that differ only slightly, either because of a few mutational changes or because of imprecision in sequencing. To avoid confusion, however, the reader should bear in mind the differences between the quaternary rank number and the classical concept of the allele. Any cry gene specified with a quaternary rank is a natural isolate, and no assumption about functionality is implied by the presence of this rank number in the gene name. In contrast, an allele number would be assumed, unless parenthetical or subscripted information indicated otherwise, to denote a nonfunctional mutant form of a wild-type gene found at a discrete genetic locus. Because of the somewhat modular nature of the Cry proteins and the effect that various segmental relationships could have on the clustering algorithm, these boundaries are likely to move slightly or even bend as the addition of new sequences changes the topology of the phylogenetic tree. Currently the boundaries represent approximately 95, 78, and 45% sequence identity.

A B. thuringiensis Pesticidal Crystal Protein Nomenclature Committee, consisting of the authors of this paper, will remain as a standing committee of the Bacillus Genetic Stock Center (BGSC) to assist workers in the field of B. thuringiensis genetics in assigning names to new Cry and Cyt toxins. The corresponding gene or protein sequences must first be deposited into a publicly accessible database (GenBank, EMBL, or PIR) and released by the repository for electronic publication in the database so that the scientific community may conduct an independent analysis. Researchers should submit new sequences directly to the BGSC director (D. R. Zeigler), either by electronic mail (zeigler.1@osu.edu) or on computer diskette. The director will analyze the amino acid sequence as described above and suggest the appropriate name, subject to the approval of the committee. The committee will periodically review the literature of the Cry and Cyt toxins and publish a comprehensive list. This list, alongside other relevant information, will also be available via the Internet at the following URL: http://www.biols.susx.ac.uk/Home/Neil_Crickmore/Bt/.

The current list of cry and cyt genes (including quaternary ranks) is given in Table 1. New gene names are listed with their previous names, their GenBank accession numbers, and published references. The quaternary ranks were assigned in the order in which the gene sequences were discovered in the literature or submitted to the committee. Genes assigned the quaternary rank 1 represent holotype sequences.

The boundaries shown in Fig. 1 allow most cry genes to retain the names they received under the system of Höfte and Whiteley (44), after a substitution of Arabic for Roman numerals. There are a few notable exceptions: cryIG becomes cry9A, cryIIIC becomes cry7Aa, cryIIID becomes cry3C, cryIVC becomes cry10A, cryIVD becomes cry11A, cytA becomes cyt1A, and cytB becomes cyt2A (Table 1). Under the revised system, the known Cry and Cyt proteins fall into 24 sets at the primary rank: Cyt1, Cyt2, and Cry1 through Cry22.

ROBUSTNESS OF THE NOMENCLATURE 

The robustness of the current naming process was assessed by a number of additional analyses. The choice of clustering algorithm (the unweighted pair-group method using arithmetic averages) was driven largely by its consistent placement of the root and its constant branch lengths, which produce a common vertical alignment of sequence names and essentially allow a "ruler across the tree" approach to naming. It has the drawback of imposing a common evolutionary clock on the clustering process, an assumption that cannot be assured. The distance metric related to percent identity (essentially 1 minus the fraction of identical residues among the total compared without gaps) is the one most commonly found as the output of sequence comparison programs, including CLUSTAL W. For phylogenetic analysis, a more usual distance metric relates to the number of substitutions per site required to convert one sequence to the other (e.g., Dayhoff's point accepted mutation [PAM]) and accounts for the possibility of multiple substitutions per site as the sequences become more divergent. The latter approach has the drawbacks of being more computationally intensive and, for very divergent sequences, of requiring values too large to compute, resulting in numeric computation failures. The two approaches also differ in the way sequences of unequal length are handled: the percent identity method typically ignores excess sequence, whereas the other methods assign a penalty. This difference is particularly important for crystal proteins, since a number of them lack the C-terminal protoxin segments yet are quite related to some longer toxins in the N-terminal toxin segment; we feel that the stronger association of such relationships found by the percent identity method is preferable.
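The percent-identity distance described above can be sketched as follows. This is a simplified illustration that assumes two sequences already aligned with "-" as the gap character; it is not CLUSTAL W's code, and the function name is hypothetical. The second example shows why a toxin lacking the C-terminal protoxin segment is not penalized for the missing residues.

```python
def p_distance(aligned_a, aligned_b, gap="-"):
    """1 minus the fraction of identical residues, counting only gap-free columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns are skipped, so unmatched extra sequence costs nothing
        compared += 1
        identical += (x == y)
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 1.0 - identical / compared


print(p_distance("MKLV--AG", "MKIVQQAG"))          # one mismatch in six compared columns
print(p_distance("MKLVAG------", "MKLVAGSTRNDE"))  # 0.0: the missing tail is ignored
```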

To assess the effect of using the neighbor-joining method to generate an unrooted tree, CLUSTAL W routines were used to generate such a tree with 1,000 bootstraps of the sequence alignment we used for Fig. 1. When an appropriate outgroup was chosen, the resulting tree (not shown) resembled our Fig. 1. The bootstrap values indicated that the tree thus generated
TABLE 1. Known cry and cyt gene sequences with revised nomenclature assignments

Revised      Original gene or    Accession    Coding       Reference
gene name    protein name        no.          region^a
cry2Ab2      cryIIB              X55416       874-2775     17
cry2Ac1      cryIIC              X57252       2125-3990    124
cry3Aa1      cryIIIA             M22472       25-1956      39
cry3Aa2      cryIIIA             J02978       241-2172     93
cry3Aa3      cryIIIA             Y00420       566-2497     41
cry3Aa4      cryIIIA             M30503       201-2132     65
cry3Aa5      cryIIIA             M37207       569-2500     22
cry3Aa6      cryIIIA             U10985       569-2500     1
cry3Ba1      cryIIIB2            X17123       25->1977     101
cry3Ba2      cryIIIB             A07234       342-2297     85
cry3Bb1      cryIIIBb            M89794       202-2157     24
cry3Bb2      cryIIIC(b)          U31633       144-2099     23
cry3Ca1      cryIIID             X59797       232-2178     59
cry4Aa1      cryIVA              Y00423       1-3540       121
cry4Aa2      cryIVA              D00248       393-3935     95
cry4Ba1      cryIVB              X07423       157-3564     16
cry4Ba2      cryIVB              X07082       151-3558     112
cry4Ba3      cryIVB              M20242       526-3930     125
cry4Ba4      cryIVB              D00247       461-3865     95
cry5Aa1      cryVA(a)            L07025       1->4155      102
cry5Ab1      cryVA(b)            L07026       1->3867      67
cry5Ac1                          I34543       1->3660      76
cry5Ba1      PS86Q3              U19725       1->3735      76
cry6Aa1      cryVIA              L07022       1->1425      68
cry6Ba1      cryVIB              L07024       1->1185      67
cry7Aa1      cryIIIC             M64478       184-3597     58
cry7Ab1      cryIIIC(b)          U04367       1->3414      75
cry7Ab2      cryIIIC(c)          U04368       1->3414      75
cry8Aa1      cryIIIE             U04364       1->3471      29
cry8Ba1      cryIIIG             U04365       1->3507      66
cry8Ca1      cryIIIF             U04366       1-3447       70
cry9Aa1      cryIG               X58120       5807-9274    104
cry9Aa2      cryIG               X58534       385->3837    32
cry9Ba1      cryX                X75019       26-3488      97
cry9Ca1      cryIH               Z37527       2096-5569    57
cry9Da1      N141                D85560       47-3553      4
cry9Da2                          AF042733     <1->1937     122
cry10Aa1     cryIVC              M12662       941-2965     111
cry11Aa1     cryIVD              M31737       41-1969      21
cry11Aa2     cryIVD              M22860       <1-235       2
cry11Ba1     Jeg80               X86902       64-2238      19
cry11Bb1     94 kDa              AF017416                  72
cry12Aa1     cryVB               L07027       1->3771      67
cry13Aa1     cryVC               L07023       1-2409       90
cry14Aa1     cryVD               U13955       1-3558       77
cry15Aa1     34 kDa              M76442       1036-2055    11
cry16Aa1     cbm71               X94146       158-1996     5
cry17Aa1     cbm72               X99478       12-1865      5
cry18Aa1     cryBP1              X99049       743-2860     126
cry19Aa1     Jeg65               Y07603       719-2662     86
cry19Ba1                         D88381                    87
cry20Aa1     86 kDa              U82518       60-2318      61
cry21Aa1                         I32932       1-3501       74
cry22Aa1                         I34547       1-2169       76
cyt1Aa1      cytA                X03182       140-886
cyt1Aa2      cytA                X04338       509-1255     120
cyt1Aa3      cytA                Y00135       36-782       26
cyt1Aa4      cytA                M35968       67-813       30
cyt1Ab1      cytM                X98793       28-777       109
cyt1Ba1                          U37196       1-795        78
cyt2Aa1      cytB                Z14147       270-1046     51
cyt2Ba1      "cytB"              U52043       287-655      35
cyt2Bb1                          U82519       416-1204     15


Revised gene name: cry1Aa1, cry1Aa2, cry1Aa3, cry1Aa4, cry1Aa5, cry1Aa6, cry1Ab1, cry1Ab2, cry1Ab3, cry1Ab4, cry1Ab5, cry1Ab6, cry1Ab7, cry1Ab8, cry1Ab9, cry1Ab10, cry1Ac1, cry1Ac2, cry1Ac3, cry1Ac4, cry1Ac5, cry1Ac6, cry1Ac7, cry1Ac8, cry1Ac9, cry1Ac10, cry1Ad1, cry1Ae1, cry1Af1, cry1Ba1, cry1Ba2, cry1Bb1, cry1Bc1, cry1Bd1, cry1Ca1, cry1Ca2, cry1Ca3, cry1Ca4, cry1Ca5, cry1Ca6, cry1Ca7, cry1Cb1, cry1Da1, cry1Db1, cry1Ea1, cry1Ea2, cry1Ea3, cry1Ea4, cry1Eb1, cry1Fa1, cry1Fa2, cry1Fb1, cry1Ga1, cry1Ga2, cry1Gb1, cry1Ha1, cry1Hb1, cry1Ia1, cry1Ia2, cry1Ia3, cry1Ia4, cry1Ia5, cry1Ib1, cry1Ja1, cry1Jb1, cry1Ka1, cry2Aa1, cry2Aa2, cry2Aa3, cry2Ab1

Original gene or protein name: cryIA(a), cryIA(a), cryIA(a), cryIA(a), cryIA(a), cryIA(a), cryIA(b), cryIA(b), cryIA(b), cryIA(b), cryIA(b), cryIA(b), cryIA(b), cryIA(b), cryIA(b), cryIA(b), cryIA(c), cryIA(c), cryIA(c), cryIA(c), cryIA(c), cryIA(c), cryIA(c), cryIA(c), cryIA(c), cryIA(c), cryIA(e), icp, cryIB, ET5, cryIB(c), cryE1, cryIC, cryIC, cryIC, cryIC, cryIC, cryIC, cryIC, cryIC(b), cryID, prtB, cryIE, cryIE, cryIE, cryIE(b), cryIF, cryIF, prtD, prtA, cryIM, cryH1, prtC, cryV, cryV, cryV, cryV, cryV159, cryV465, ET4, ET1, cryIIA, cryIIA, cryIIB

Accession no.: M11250, M10917, D00348, X13535, D17518, U43605, M13898, M12661, M15271, D00117, X04698, M37263, X13233, M16463, X54939, A29125, M11068, M35524, X54159, M73249, M73248, U43606, U87793, U87397, U89872, AJ002514, M73250, M65252, U82003, X06711, X95704, L32020, Z46442, U70726, X07518, X13620, M73251, A27642, X96682, X96683, X96684, M97880, X54160, Z22511, X53985, X56144, M73252, U94323, M73253, M63897, M73254, Z22512, Z22510, Y09326, U70725, Z22513, U35780, X62821, M98544, L36338, L49391, Y08920, U07642, L32019, U31527, U28801, M31738, M23723, D86064, M23724

Coding region^a: 527-4054, 153->2955, 73-3600, 1-3528, 81-3608, 1->1860, 142-3606, 155-3622, 156-3620, 163-3627, 141-3605, 73-3537, 1-3465, 157-3621, 73-3537, ^b, 388-3921, 239-3769, 339->2192, 1-3534, 1-3531, 1->1821, 976-4509, 153-3686, 388-3921, 388-3921, 1-3537, 81-3623, 172->2905, 1-3684, 186-3869, 67-3753, 141-3839, 47-3613, 241->2711, 1-3570, 234-3800, 1->2268, 1->2268, 1->2268, 296-3823, 264-3758, 241-3720, 130-3642, 1-3513, 1-3513, 388-3900, 1-3522, 478-3999, 1-3525, 483-4004, 67-3564, 692-4210, 530-4045, 728-4195, 355-2511, 1-2157, 279-2435, 61-2217, 524-2680, 237-2393, 99-3519, 177-3686, 451-4098, 156-2054, 1840-3738, 2007-3911, 1-1899

Reference: 92, 98, 99, 62, 113, 63, 119, 111, 31, 50, 40, 37, 36, 69, 13, 28, 3, 117, 18, 84, 83, 63, 38, 71, 33, 107, 79, 60, 49, 10, 105, 25, 6, 12, 45, 88, 79, 114, 106, 106, 106, 48, 42, 56, 115, 7, 82, 47, 81, 14, 80, 56, 56, 96, 12, 56, 53, 108, 34, 100, 54, 94, 100, 25, 116, 52, 20, 123, 89, 123



^a The symbols < and > indicate that the coding region extends up- or downstream, respectively, from the known sequence data.
^b Only the polypeptide sequence has been reported.
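The coding-region notation used in Table 1 can be read mechanically. The sketch below is hypothetical: it assumes entries of the form start-end, with an optional < before the start and an optional > before the end (as in <1-235 or 1->3414), and it ignores stray spaces left by extraction. The function and key names are illustrative only.

```python
import re

# Optional "<" before the start, optional ">" before the end, e.g. "<1-235", "1->3414".
REGION_PATTERN = re.compile(r"^(?P<up><)?(?P<start>\d+)-(?P<down>>)?(?P<end>\d+)$")


def parse_coding_region(text):
    """Parse a Table 1 coding-region entry into positions and open-end flags."""
    match = REGION_PATTERN.match(text.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognized coding region: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if end < start:
        raise ValueError(f"end precedes start: {text!r}")
    return {
        "start": start,
        "end": end,
        "extends_upstream": match["up"] is not None,      # "<": region continues upstream
        "extends_downstream": match["down"] is not None,  # ">": region continues downstream
    }


print(parse_coding_region("<1->1937"))
```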
[Fig. 1 image: phylogram of the holotype Cry and Cyt toxins, with lineages labeled Main Cry Lineage, Cyt Lineage, and Outlying Cry Lineages; vertical rank boundaries labeled Primary Rank, Secondary Rank, and Tertiary Rank; x axis, Percent Amino Acid Sequence Identity (10 to 90).]

FIG. 1. Phylogram demonstrating amino acid sequence identity among Cry and Cyt proteins. This phylogenetic tree is modified from a TREEVIEW visualization of NEIGHBOR treatment of a CLUSTAL W multiple alignment and distance matrix of the full-length toxin sequences, as described in the text. The gray vertical bars demarcate the four levels of nomenclature ranks. Based on the low percentage of identical residues and the absence of any conserved sequence blocks in multiple-sequence alignments, the lower four lineages are not treated as part of the main toxin family, and their nodes have been replaced with dashed horizontal lines in this figure.



had significant branch points deeper in the tree than the chosen primary rank in the nomenclature. This sort of analysis was rejected as unsuitable for the purposes of Cry nomenclature due to the generally ragged branch lengths it produced and the requirement for the careful choice of an outgroup.
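Bootstrapping an alignment, as in the 1,000-replicate analysis above, means resampling alignment columns with replacement and rebuilding a tree from each pseudo-replicate; the fraction of replicates that recover a grouping is its bootstrap value. The sketch below shows only the resampling step, with a hypothetical function name, toy sequences, and a seeded random generator so that runs are repeatable; building a tree for each replicate is left out.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Return one pseudo-replicate alignment with columns resampled with replacement.

    alignment: dict mapping sequence name -> aligned sequence (all equal length).
    rng: a random.Random instance, so results can be reproduced from a seed.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (length,) = lengths
    # Draw column indices; the same column can be chosen several times.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


alignment = {"A": "MKLVAG", "B": "MKIVAG", "C": "MRIVSG"}
rng = random.Random(1998)
replicates = [bootstrap_replicate(alignment, rng) for _ in range(1000)]
print(len(replicates), replicates[0]["A"])
```

Resampling whole columns, rather than individual residues, keeps each site's residues together across sequences, which is what makes each replicate a plausible alternative alignment.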
An alternative method of clustering protein sequences, capable of handling sequences that are quite diverse, is parsimony analysis. A consensus tree generated from 100 bootstraps of such an analysis displaces the two incomplete Cry1 sequences (Cry1Bd and Cry1Af) and the two Cry1 sequences lacking the C-terminal protoxin segments (Cry1Ia and Cry1Ib) into a region of the tree populated with such shortened sequences (not shown). With the further exceptions of Cry12A being interjected into the Cry5 cluster and a number of sequences besides Cry6B clustering higher in the tree than Cry6A, the proposed nomenclature successfully reflects the grouping of sequences provided by this method of analysis as well.
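Parsimony analysis prefers the tree that requires the fewest substitutions. Searching the space of trees is beyond a short example, but the scoring step can be sketched with Fitch's small-parsimony algorithm on a fixed rooted binary tree. The tree encoding (nested pairs of names), the function names, and the toy sequences are assumptions for illustration, not the program used for the analysis above.

```python
def fitch_site(tree, column):
    """Fitch pass for one alignment column.

    tree:   a leaf name (str) or a pair (left, right) of subtrees.
    column: dict mapping leaf name -> residue in this column.
    Returns (candidate residue set at this node, substitutions counted below it).
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, column)
    right_set, right_cost = fitch_site(right, column)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No residue fits both children, so one substitution is required here.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total Fitch substitutions over all columns of an equal-length alignment."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (length,) = lengths
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


alignment = {"A": "MKLV", "B": "MKIV", "C": "MRIV", "D": "QRIV"}
print(parsimony_score((("A", "B"), ("C", "D")), alignment))  # 3
print(parsimony_score((("A", "C"), ("B", "D")), alignment))  # 4
```

Under parsimony the first grouping, which needs fewer substitutions, would be preferred.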

As noted above, the usual distance metrics for phylogenetic analysis account for multiple substitutions per site; most commonly, the Dayhoff PAM metric is used. When this distance metric was applied to the alignment used to make Fig. 1, a large number of the sequence pairs were found to have infinite distance. Therefore, the main Cry lineage and the Cyt lineage were aligned separately, the distances were calculated, and the distance matrices were clustered with the FITCH program of the PHYLIP software package. This analysis revealed several strongly associated groups of sequences (>90% of trees) in the main Cry lineage that extend deeper into the tree than the primary rank assigned in the proposed nomenclature: Cry1; Cry3; Cry4; Cry7; the Cry5-Cry12-Cry13-Cry14-Cry21 group; the Cry8-Cry9 group; the Cry10-Cry19 group; the Cry16-Cry17 group; and the Cry2-Cry11-Cry18 group. Many of these groups, however, were separated by branch points that were either nonmajority or found <60% of the time; thus, the arrangement of these groups would likely change as more sequences are added. At the secondary rank, the only anomaly with respect to the proposed nomenclature was the interjection of the Cry1Ia and Cry1Ib sequences into the Cry1B group. This effect may be due to an artificially reduced distance between the Cry1I sequences and the incomplete Cry1Bd sequence caused by the particular distance metric used. The Cyt lineage sequences were separated into the expected two primary rank groups, which in turn separate into the expected secondary rank groupings. This more standard phylogenetic approach also suffers from the visually disorienting effect of uneven branch lengths and from a shortening of the more closely related branches, especially at the tertiary rank (lowercase letter), where a great deal of comparative work has been done among the Cry1 toxins.
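The infinite distances mentioned above arise because corrections for multiple substitutions diverge as sequences approach saturation. Dayhoff PAM distances are computed from an empirical substitution matrix too large to reproduce here, so this sketch substitutes Kimura's simpler empirical formula for protein distances, d = -ln(1 - p - 0.2p^2), where p is the proportion of differing sites. It is a stand-in that shows the same failure mode, not the PAM calculation used in the analysis; the function name is hypothetical.

```python
import math


def kimura_protein_distance(p):
    """Kimura's approximation to substitutions per site from the fraction p of differing sites."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    argument = 1.0 - p - 0.2 * p * p
    if argument <= 0.0:
        # Saturated: no finite distance exists, analogous to the infinite distances above.
        return math.inf
    return -math.log(argument)


for identity in (90, 50, 20, 10):
    p = 1 - identity / 100
    print(identity, round(kimura_protein_distance(p), 3))
```

Under this formula the logarithm's argument reaches zero near p = 0.854, so any pair sharing less than roughly 15% identity has no finite distance; this is why very divergent lineages have to be aligned and analyzed separately.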

In summary, the proposed nomenclature uses readily available software whose output can be easily interpreted by investigators in the field, and it meets their needs as well as, or better than, alternative methods of analysis and presentation. When the holotype toxins were analyzed by alternative phylogenetic methods, the hierarchy implied by the nomenclature was essentially consistent with the resulting phylogenetic clustering, and the few exceptions were largely explainable by known properties of the sequences in question.